Clustering Using Monte Carlo Cross-Validation

Author

  • Padhraic Smyth
Abstract

Finding the “right” number of clusters, k, for a data set is a difficult, and often ill-posed, problem. In a probabilistic clustering context, likelihood-ratios, penalized likelihoods, and Bayesian techniques are among the more popular techniques. In this paper a new cross-validated likelihood criterion is investigated for determining cluster structure. A practical clustering algorithm based on Monte Carlo cross-validation (MCCV) is introduced. The algorithm permits the data analyst to judge if there is strong evidence for a particular k, or perhaps weaker evidence over a sub-range of k values. Experimental results with Gaussian mixtures on real and simulated data suggest that MCCV provides genuine insight into cluster structure. v-fold cross-validation appears inferior to the penalized likelihood method (BIC), a Bayesian algorithm (AutoClass v2.0), and the new MCCV algorithm. Overall, MCCV and AutoClass appear the most reliable of the methods. MCCV provides the data miner with a useful data-driven clustering tool which complements the fully Bayesian approach.
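The idea in the abstract can be illustrated with a minimal sketch: repeatedly split the data at random into training and test parts, fit a Gaussian mixture with k components on the training part, and average the held-out log-likelihood for each k. This is not the paper's implementation. The one-dimensional mixture, the quantile-based initialisation, the variance floor, and the defaults `n_splits=20` and `test_fraction=0.5` are all assumptions made for illustration, and every function name is hypothetical.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def fit_gmm_1d(xs, k, iters=50):
    """Fit a k-component 1-D Gaussian mixture with plain EM.

    Assumption: means start at evenly spaced quantiles of the data,
    and variances are floored to avoid degenerate components.
    """
    n = len(xs)
    ordered = sorted(xs)
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    mean_all = sum(xs) / n
    var_all = sum((x - mean_all) ** 2 for x in xs) / n
    variances = [var_all] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            s = sum(p) or 1e-300
            resp.append([pj / s for pj in p])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, 1e-3 * var_all)
    return weights, means, variances

def log_likelihood(xs, params):
    """Total log-likelihood of xs under a fitted mixture."""
    weights, means, variances = params
    total = 0.0
    for x in xs:
        p = sum(w * normal_pdf(x, m, v)
                for w, m, v in zip(weights, means, variances))
        total += math.log(max(p, 1e-300))
    return total

def mccv_scores(xs, k_values, n_splits=20, test_fraction=0.5, seed=0):
    """Average held-out log-likelihood for each k over random splits."""
    rng = random.Random(seed)
    n_test = int(len(xs) * test_fraction)
    scores = {k: 0.0 for k in k_values}
    for _ in range(n_splits):
        shuffled = xs[:]
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        for k in k_values:
            params = fit_gmm_1d(train, k)
            scores[k] += log_likelihood(test, params) / n_splits
    return scores

# Synthetic data (an assumption for the demo): two well-separated clusters.
data_rng = random.Random(42)
data = ([data_rng.gauss(0.0, 1.0) for _ in range(100)]
        + [data_rng.gauss(10.0, 1.0) for _ in range(100)])

scores = mccv_scores(data, k_values=[1, 2, 3])
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Comparing the averaged scores across k, rather than reading off only the maximum, is in the spirit of the abstract's remark that the evidence may favour a single k strongly or be spread more weakly over a range of k values.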


Similar resources

Weighted rank aggregation of cluster validation measures: a Monte Carlo cross-entropy approach

MOTIVATION Biologists often employ clustering techniques in the explorative phase of microarray data analysis to discover relevant biological groupings. Given the availability of numerous clustering algorithms in the machine-learning literature, a user might want to select one that performs the best for his/her data set or application. While various validation measures have been proposed over ...


Siemens primus accelerator simulation using EGSnrc Monte Carlo code and gel dosimetry validation with optical computed tomography system by EGSnrc code

The Monte Carlo method is the most accurate method for simulation of radiation therapy equipment. Linear accelerators (linacs) are currently the most widely used machines in radiation therapy centers. Monte Carlo modeling of the Siemens Primus linear accelerator in 6 MeV beams was used. A square field size of 10 × 10 cm² produced by the jaws was compared with TLD. Head simulation of Siemens accele...


An Empirical Study on the Visual Cluster Validation Method with Fastmap

This paper presents an empirical study on the visual method for cluster validation based on the Fastmap projection. The visual cluster validation method attempts to tackle two clustering problems in data mining: (1) to verify partitions of data created by a clustering algorithm and (2) to identify genuine clusters from data partitions. They are achieved through projecting objects and clus...


Selection Criteria Based on Monte Carlo Simulation and Cross Validation in Mixed Models

In the mixed modeling framework, Monte Carlo simulation and cross validation are employed to develop an “improved” Akaike information criterion, AICi, and the predictive divergence criterion, PDC, respectively, for model selection. The selection and estimation performance of the criteria are investigated in a simulation study. Our simulation results demonstrate that PDC outperforms AIC and A...


A Mixture model with random-effects components for clustering correlated gene-expression profiles

MOTIVATION The clustering of gene profiles across some experimental conditions of interest contributes significantly to the elucidation of unknown gene function, the validation of gene discoveries and the interpretation of biological processes. However, this clustering problem is not straightforward as the profiles of the genes are not all independently distributed and the expression levels may...




Publication date: 1996